home *** CD-ROM | disk | FTP | other *** search
- Path: ibm32.perftech.com!usenet
- From: murf@perftech.com (John Murphy)
- Newsgroups: comp.lang.c
- Subject: Re: Sorting large files
- Date: 30 Jan 1996 16:38:44 GMT
- Organization: Performance Technology Inc
- Message-ID: <4elhik$cpq@ibm32.perftech.com>
- References: <4e8j9b$cuf@longwood.cs.ucf.edu>
- NNTP-Posting-Host: k5zba.perftech.com
- Mime-Version: 1.0
- NNTP-Posting-User: REVCO
- X-Newsreader: WinVN 0.93.11
-
- In article <4e8j9b$cuf@longwood.cs.ucf.edu>, schnitzi@longwood.cs.ucf.edu
- says...
- >
- >There was some discussion here a little while
- >back on how to sort the lines in a large file
- >without having to have a huge character array.
- >I suggested using ftell and fseek to hunt down
- >the particular lines you are comparing. Only
- >just the other day did I notice (via a web
- >search engine) that someone posted a question
- >on how this could be done... So here goes.
- >
- >
- >This program sorts stdin to stdout, so it would
- >have to be run like this:
- >
- > sort < infile > outfile
- >
- >It should be easy enough to modify to use file
- >pointers, though. It also requires a 'long'
- >for every line in the file, but this beats the
- >heck out of a huge character array. I've
- >written it here to handle up to 5000-line files.
- >
- >
- >#include <stdio.h>
- >
- >main()
- >{
- > long index[5000], z;
- >
- > char line1[ 256 ], line2[ 256 ];
- >
- > int i, j, count=0;
- >
- > do
- > {
- > index[ count++ ] = ftell( stdin );
- > }
- > while ( gets(line1) );
- >
- > for ( i=0; i<count-1; i++ )
- > {
- > for ( j=i+1; j<count; j++ )
- > {
- > fseek( stdin, index[i], 0 );
- > gets( line1 );
- > fseek( stdin, index[j], 0 );
- > gets( line2 );
- >
- > if ( strcmp( line1, line2 ) > 0 )
- > {
- > z = index[i];
- > index[i] = index[j];
- > index[j] = z;
- > }
- > }
- > }
- >
- > for ( i=0; i<count; i++ )
- > {
- > fseek( stdin, index[i], 0 );
- > gets( line1 );
- > puts( line1 );
- > }
- >}
- >
- Who says computer people don't have a sense of humor?!
-
- Least any neophytes miss the joke and take this as a serious solution to the
- sorting problem, perhaps we should point out that while this code will
- compile and run and even (eventually) sort a file, it's practical use will
- be limited by the MTBF of your equipment and/or your life expectancy ---
- it's SLOOOW! Any file I/O operation is time consuming, and depending on the
- file system, random file I/O can be VERY time consuming; with MS-DOS, for
- instance, reading a large file backwards is an N-squared operation.
-
- As a sanity check on the practicality of this method of sorting, try sorting
- a 20,000 byte file; that's a file that would require the same amount of
- memory as the array of 5000 longs. On one of my systems, using this code to
- sort a file of 250 80 bytes records took 17:30 minutes; sorting the same
- file with MS-DOS SORT (a very slow sort program) took 0.8 seconds.
-
- The trick to sorting large files in a reasonable amount of time is to
- minimize the file I/O. As usual, see Knuth for details...
-
- Murf
-
-